Artificial intelligence-based classification of cardiac autonomic neuropathy from retinal fundus images in patients with diabetes: The Silesia Diabetes Heart Study

Background Cardiac autonomic neuropathy (CAN) in diabetes mellitus (DM) is independently associated with cardiovascular (CV) events and CV death. Diagnosis of this complication of DM is time-consuming and not routinely performed in clinical practice, in contrast to retinal fundus imaging, which is accessible and routinely performed. Whether artificial intelligence (AI) utilizing retinal images collected through diabetic eye screening can provide an efficient diagnostic method for CAN is unknown. Methods This was a single-centre, observational study in a cohort of patients with DM, conducted as part of the Cardiovascular Disease in Patients with Diabetes: The Silesia Diabetes-Heart Project (NCT05626413). To diagnose CAN, we used standard CV autonomic reflex tests. In this analysis we implemented AI-based deep learning techniques with non-mydriatic 5-field color fundus imaging to identify patients with CAN. Two experiments were developed utilizing Multiple Instance Learning and primarily ResNet 18 as the backbone network. Models underwent training and validation prior to testing on an unseen image set. Results In an analysis of 2275 retinal images from 229 patients, the ResNet 18 backbone model demonstrated robust diagnostic capabilities in the binary classification of CAN, correctly identifying 93% of CAN cases and 89% of non-CAN cases within the test set. The model achieved an area under the receiver operating characteristic curve (AUCROC) of 0.87 (95% CI 0.74–0.97). For identifying definite or severe stages of CAN (dsCAN), the ResNet 18 model accurately classified 78% of dsCAN cases and 93% of cases without dsCAN, with an AUCROC of 0.94 (95% CI 0.86–1.00). An alternate backbone model, ResWide 50, showed enhanced sensitivity at 89% for dsCAN, but with a marginally lower AUCROC of 0.91 (95% CI 0.73–1.00). Conclusions AI-based algorithms utilising retinal images can differentiate patients with CAN with high accuracy. AI analysis of fundus images to detect CAN may be implemented in routine clinical practice to identify patients at the highest CV risk. Trial registration This is a part of the Silesia Diabetes-Heart Project (ClinicalTrials.gov Identifier: NCT05626413). Supplementary Information The online version contains supplementary material available at 10.1186/s12933-024-02367-z.


Introduction
The International Diabetes Federation estimates that globally ~ 10.5% of adults have diabetes mellitus (DM), which is predicted to more than double by 2050 [1]. Cardiac autonomic neuropathy (CAN) is a common yet underdiagnosed microvascular complication of DM [2]. The prevalence of CAN increases with DM duration in both type 1 DM (T1DM) and type 2 DM (T2DM), reaching up to 30% after 20 years of T1DM [2,3] and 60% after 15 years of T2DM [4].
Early detection of CAN is pivotal because CAN is an independent risk factor for cardiovascular (CV) mortality, arrhythmias and silent myocardial ischemia, with a ~ 3 × relative risk for CV events and mortality [5,6]. Current diagnostic modalities for CAN are limited, the majority being labour-intensive cardiac autonomic reflex tests [7]. Given their intricate nature, these tests are not routinely employed in standard clinical practice. Hence, most diagnoses are made only once symptoms develop, which indicates advanced, end-stage CAN with almost total autonomic denervation [7].
Artificial intelligence (AI) is becoming more widely used for diagnosis, with recent advances demonstrating connections between different patients' phenotypes, features, parameters and body examination images [8]. Recently, we developed an AI-based deep learning algorithm utilizing corneal confocal microscopy images of the subbasal nerve plexus to detect peripheral neuropathy with excellent accuracy [9]. AI-enabled retinal imaging can predict circulatory mortality, myocardial infarction and stroke [10] and, more recently, neurological disorders [11].
Colour fundus photography of the retina is easy to perform and is used in routine annual diabetic retinopathy (DR) screening. However, retinal imaging may detect systemic complications beyond DR; for example, imaging of the arterioles and venules of the retina may serve as an indicator of circulatory disease [12], including with the use of AI techniques [13]. Deep learning-based algorithms applied to retinal images have been utilised with good performance in identifying patients with diabetic peripheral neuropathy (DPN) [14].
To date, no study has developed and validated an AI-based algorithm that uses retinal images to diagnose CAN and to further differentiate between early CAN (eCAN) and definite or severe CAN (dsCAN). Based on previous studies in which retinal imaging combined with AI-based deep learning algorithms proved efficient in identifying patients with cardiovascular disease (CVD) or DPN, we hypothesised that AI could be a potent tool to diagnose CAN in patients with DM.

Methods
This is a single-centre, observational study conducted in a cohort of patients with DM admitted to the Department of Internal Medicine and Diabetology, Clinic Hospital no 1 in Zabrze, Poland, and outpatients from the Diabetology Clinics in the Silesia Region, Poland, from October 6, 2021 until July 9, 2023, who fulfilled the inclusion and exclusion criteria. This is a part of the Silesia Diabetes-Heart Project (ClinicalTrials.gov Identifier: NCT05626413). Participants gave informed consent before any assessments took place. The study aligned with the Declaration of Helsinki standards and was approved by the Ethics Committee of the Medical University of Silesia (KNW/0022/KB1/10/17).

Inclusion and exclusion criteria
The inclusion criteria for the study were: (i) informed consent; (ii) diagnosis of T1DM for at least 5 years or T2DM of any duration prior to enrolment; (iii) age of at least 18 years. The exclusion criteria were: proliferative retinopathy, any severe and acute illness, disabled or bedridden status, solid organ transplant, previously diagnosed causes of neuropathy other than DM, pregnancy, alcohol use disorder, and severe hypoglycaemia in the past 24 h.

Medical history
Demographic data were reported by the patients, and comorbid conditions were ascertained from documented medical history. Chronic kidney disease was defined as either an estimated glomerular filtration rate (eGFR) < 60 ml/min per 1.73 m² or a urine albumin to creatinine ratio (UACR) ≥ 30 mg/g for a duration of at least 3 months. DR grading was performed by an experienced ophthalmologist based on the fundus images collected in this study. Laboratory data were obtained during the hospital stay or outpatient clinic visit at the time of CAN testing and color fundus examination.

CAN diagnosis
CAN was diagnosed based on the Toronto consensus panel criteria with the use of cardiovascular autonomic reflex tests (CARTs), which are considered the gold standard in CAN testing [2]. For CARTs assessment, DiCAN (Diabetic Cardiac Autonomic Neuropathy; Medicore, Seoul, South Korea) was utilized. Patients underwent CARTs in the morning (between 8:00 a.m. and 12:00 noon) in the same consistently lit and quiet room. The heart rate responses to the deep breathing test (expiration/inspiration ratio), the lying-to-standing test (30:15 supine to standing ratio) and the Valsalva manoeuvre were evaluated. Furthermore, orthostatic hypotension was assessed by comparing blood pressure measurements before and after standing. If any of the CARTs results were missing, the patient was excluded from the analysis. CAN was staged based on the number of abnormal CARTs results, using the following criteria: (i) early CAN (eCAN): one abnormal CART; (ii) definite CAN: at least two abnormal CARTs; (iii) severe CAN: at least two abnormal CARTs accompanied by orthostatic hypotension. Patients with advanced stages of CAN, namely definite or severe CAN, were grouped together for further analysis and labelled dsCAN. A detailed table describing the CARTs and their cutoffs for abnormal values is provided in Supplementary Table S1.
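The staging rule above can be written as a small function. The following is a minimal sketch, not the study's software: the function names are hypothetical, the three heart-rate tests are assumed to be the CARTs that are counted, and the cutoffs that make a test abnormal (Supplementary Table S1) are not reproduced. How orthostatic hypotension with fewer than two abnormal CARTs should be handled is not stated in the text, so the sketch stages such patients by the CART count alone.

```python
def stage_can(abnormal_carts: int, orthostatic_hypotension: bool) -> str:
    """Stage CAN from the number of abnormal heart-rate CARTs (assumed 0 to 3)."""
    if not 0 <= abnormal_carts <= 3:
        raise ValueError("expected between 0 and 3 abnormal CARTs")
    if abnormal_carts == 0:
        return "no CAN"
    if abnormal_carts == 1:
        # Assumption: orthostatic hypotension alone does not change this stage.
        return "eCAN"
    # Two or more abnormal CARTs: definite, or severe with orthostatic hypotension.
    return "severe CAN" if orthostatic_hypotension else "definite CAN"


def is_dscan(stage: str) -> bool:
    """Definite and severe CAN are pooled into the dsCAN group."""
    return stage in {"definite CAN", "severe CAN"}


for carts, oh in [(0, False), (1, False), (2, False), (3, True)]:
    stage = stage_can(carts, oh)
    print(carts, oh, stage, is_dscan(stage))
```

The two binary experiments described later correspond to the labels `stage != "no CAN"` (any CAN) and `is_dscan(stage)` (dsCAN versus eCAN or no CAN).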

Retinal images acquisition and processing
Retinal imaging was performed with the DRSplus device (Centervue, Padua, Italy) to capture five partially overlapping fields in each non-dilated eye, namely the central, nasal, temporal, superior, and inferior fields. This enabled an effective viewing angle of the retina of up to 80 degrees. Color fundus images from individuals without CAN, with eCAN and with dsCAN were included in the analysis. The study population was randomly split into a training set (60%), a validation set (10%) and a test set (30%) at the patient level to avoid data leakage.
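A patient-level split means that patients, not images, are shuffled and partitioned, so that all images from one patient end up in the same set. The sketch below illustrates the idea under stated assumptions: the function names, the seed and the toy data are hypothetical, and the split is not stratified by CAN status because the text does not say whether stratification was used.

```python
import random


def split_patients(patient_ids, fractions=(0.6, 0.1, 0.3), seed=0):
    """Randomly split unique patient IDs into train/validation/test lists."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def assign_images(image_to_patient, splits):
    """Give every image the split of the patient it belongs to."""
    lookup = {pid: name
              for name, ids in zip(("train", "val", "test"), splits)
              for pid in ids}
    return {image: lookup[pid] for image, pid in image_to_patient.items()}


# Hypothetical data: 20 patients with 10 images each.
image_to_patient = {f"p{p:02d}_img{i}": f"p{p:02d}"
                    for p in range(20) for i in range(10)}
splits = split_patients(image_to_patient.values())
image_split = assign_images(image_to_patient, splits)
```

Splitting at the image level instead would allow images of the same eye to appear in both training and test sets, which is the leakage the patient-level design avoids.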

An exploratory analysis and model refinement
In an exploratory analysis, we initially incorporated retinal images from patients who had undergone laser therapy. It became evident that the inclusion of these post-therapeutic images introduced a degree of noise into the diagnostic model. To enhance the model's performance, these patients (eight participants) were subsequently excluded from the analysis. The final AI analysis used binary models in two experiments. The first aimed to differentiate patients with any stage of CAN from those without CAN, and the second aimed to differentiate patients with dsCAN from patients with eCAN or no CAN.

AI model-multiple instance learning
An AI-based Multiple Instance Learning (MIL) method [15] was adopted to handle the varying number of colour fundus images per patient. MIL took all fundus images of each patient into account during AI model training, thus leading to a generalizable and explainable pipeline. MIL is a type of weakly supervised learning in which training examples are organized into groups known as "bags", and a single label is assigned to the entire bag. In our context, we define a "bag" as the collection of colour fundus images belonging to a single patient, and each individual image within the bag is referred to as an "instance", following the conventional MIL framework. We align the label of the bag with the labels of its constituent instances, meaning that all instances within the same bag share the same label and are considered indicative of that label. However, since some colour fundus images may lack the distinctive features that the backbone models need to accurately predict the patient's classification, there is a chance of introducing label noise, especially in negative bags. In such cases, it becomes imperative to use only the discriminative instances, i.e., those that truly contribute to the bag's label. In this context, "discriminative" implies that an instance's genuine, underlying label matches the actual label assigned to the bag. To address this, a robustness selection process is introduced into the MIL pipeline, in which only the instances that remain robust under adversarial noise perturbation are selected for feature learning in each training epoch [15]. More details on the classic MIL algorithm, our proposed robustness-aware MIL process, the mathematical formulation, and the Shannon entropy metric used for robustness quantification are provided in the Supplementary Material.
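To make the bag and instance terminology concrete, the sketch below treats one patient as a bag of per-image class probabilities, scores each instance by Shannon entropy, keeps the most confident instances and averages them into a patient-level prediction. This is a simplified illustration and not the robustness-aware procedure of the study: the adversarial perturbation step is omitted, the use of low entropy as the selection criterion and mean pooling as the aggregation are assumptions, and all names and numbers are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_instances(bag_probs, keep):
    """Indices of the `keep` instances with the lowest predictive entropy."""
    order = sorted(range(len(bag_probs)),
                   key=lambda i: shannon_entropy(bag_probs[i]))
    return sorted(order[:keep])


def bag_prediction(bag_probs, keep):
    """Average the class probabilities of the selected instances."""
    chosen = select_instances(bag_probs, keep)
    n_classes = len(bag_probs[0])
    return [sum(bag_probs[i][c] for i in chosen) / len(chosen)
            for c in range(n_classes)]


# Hypothetical per-image probabilities [P(no CAN), P(CAN)] for one patient.
bag = [[0.10, 0.90], [0.50, 0.50], [0.20, 0.80], [0.55, 0.45]]
kept = select_instances(bag, keep=2)
patient_probs = bag_prediction(bag, keep=2)
```

In this toy bag the two near-uniform images are discarded as uninformative, so the patient-level prediction is driven by the two images that carry a clear signal.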

Implementation detail
The original fundus image size was 3600 × 2910 pixels; images were resized to 224 × 224 pixels using bilinear interpolation for efficient model training. On-the-fly data augmentation was adopted: random rotations and horizontal/vertical flips were applied to the training data with a probability of 0.3. The rotations ranged from − 30 to 30 degrees. The neural network underwent end-to-end training for 400 epochs, starting with a learning rate of 1e-4 and following a cosine decay schedule. We utilized the Adam optimizer with a batch size of 48 and employed the standard cross-entropy loss function. All training procedures were carried out on a workstation equipped with four GeForce RTX 3090 24 GiB GPUs. Subsequently, all testing experiments were performed on a local workstation featuring an Intel(R) Xeon(R) W-2104 CPU and a GeForce RTX 2080Ti GPU.
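The cosine decay schedule can be illustrated with a few lines of code. This is a sketch under assumptions the text does not settle: the rate is updated once per epoch, there is no warm-up phase, and it decays towards a floor of zero (the hypothetical `min_lr` parameter). Only the starting rate of 1e-4 and the 400 epochs come from the text.

```python
import math


def cosine_lr(epoch, total_epochs=400, base_lr=1e-4, min_lr=0.0):
    """Learning rate at the start of `epoch` under cosine decay without warm-up."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [cosine_lr(epoch) for epoch in range(400)]
for epoch in (0, 100, 200, 300, 399):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.3e}")
```

The rate starts at the base value, passes through half of it at the midpoint of training and approaches the floor at the end, which allows large updates early and progressively finer updates later.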

Performance metrics and attribution maps
To assess the classification performance of the model, various test accuracy statistics were employed, including sensitivity, specificity, F1 score, precision, and the receiver operating characteristic (ROC) curve alongside the area under the ROC curve (AUCROC). Uncertainty was quantified with 95% confidence intervals, using DeLong's method for the AUCROC and a 2000-sample bootstrap for sensitivity, specificity, F1 score, and precision.
In this study, we implemented the Grad-CAM attribution technique, which uses the gradients flowing into the last convolutional layer to construct a primary attribution map. This map highlights the areas of the image that most influence the classification outcome [16]. To augment this, we combined Grad-CAM with fine-grained details of the image to obtain Guided Grad-CAM. This refined visualization method enabled the creation of high-resolution, class-discriminative visualizations, shedding more light on the elements vital for classification.
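A bootstrap confidence interval of this kind can be sketched as follows. The example assumes a percentile bootstrap with resampling at the patient level; the text specifies only the number of resamples (2000), so the interval type, the seed and the function names are assumptions. The labels are rebuilt from the test-set counts reported in the Results (25 of 27 patients with CAN and 25 of 28 without CAN correctly classified).

```python
import random


def sensitivity(y_true, y_pred):
    """Fraction of positive cases that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([y_true[i] for i in idx],
                                [y_pred[i] for i in idx]))
        except ZeroDivisionError:
            continue  # resample lacked the class the metric needs; draw again
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Labels rebuilt from the reported test-set counts (1 = CAN, 0 = no CAN).
y_true = [1] * 27 + [0] * 28
y_pred = [1] * 25 + [0] * 2 + [0] * 25 + [1] * 3
point = sensitivity(y_true, y_pred)
ci = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"sensitivity {point:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

With a test set of this size, the resulting interval is wide, which is consistent with the wide confidence intervals acknowledged in the Discussion.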
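The core Grad-CAM computation is a gradient-weighted sum of feature maps followed by a ReLU. The sketch below shows only that combination step on plain nested lists, with hypothetical toy values; in practice the activations and gradients come from the last convolutional layer of the trained network, and the coarse map is upsampled to the image size. The Guided Grad-CAM step (element-wise multiplication with a guided backpropagation map) is not shown.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps (each H x W) into one coarse attribution map.

    activations[k][i][j]: output of channel k of the last convolutional layer.
    gradients[k][i][j]: gradient of the class score with respect to that output.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = global average of the gradients for that channel.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only the regions that push the class score up.
    return [[max(0.0, v) for v in row] for row in cam]


# Hypothetical example with 2 channels of size 2 x 2.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(acts, grads)
```

In this toy case the second channel has a negative weight, so the locations where only that channel is active are zeroed by the ReLU and do not appear in the map.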

Results
From a primary eligible population of 298 patients, a total of 229 patients were included in the final analysis. The reasons for exclusion were the lack of consent for fundus and CAN examination (n = 23), no retinal imaging (n = 21), and missing or incomplete CAN testing (n = 17). Patients with a history of retinal laser therapy (n = 8) were excluded after initial analysis. The study flowchart is presented in Fig. 1. Among analysed patients, 109 (45%) presented with any stage of CAN of whom 38 (17%) were diagnosed with dsCAN and 66 (29%) with eCAN. Demographic and clinical characteristics of studied participants are summarised in Table 1. Patients diagnosed with CAN were older, predominantly had type 2 DM and had a higher number of comorbidities. Patients with advanced stages of CAN had a longer duration of DM, a higher rate of coronary artery disease and lower eGFR. The distribution of participants and images into training, validation, and test sets, along with a breakdown by CAN status, is delineated in Supplementary Tables S2 and S3, respectively.

Model performance for the classification of CAN
For the binary classification into either no CAN or any stage of CAN (early, definite or severe), ResNet 18 achieved the best performance overall, correctly classifying 25 out of 27 (93%) patients with CAN and 25 out of 28 (89%) without CAN within the test set. The confusion matrix is shown in Table 2 and the performance metrics of the model are reported in Table 3. The model achieved an AUCROC of 0.87 (95% CI 0.74-0.97) (Fig. 2). Performance of the models utilizing alternative network backbones can be found in the Supplementary Material (Supplementary Table S4 and Figure S1).
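The point estimates follow directly from the confusion-matrix counts stated above. The short sketch below recomputes them; the function name is hypothetical, and only the four counts are taken from the text.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, precision and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


# 25 of 27 patients with CAN and 25 of 28 patients without CAN were correct.
metrics = binary_metrics(tp=25, fn=2, tn=25, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sensitivity and specificity round to the 93% and 89% quoted in the text; the same function applies to the dsCAN experiment with its own counts.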

Model performance for the classification of definite or severe stages of CAN
In the binary classification experiment that aimed to distinguish patients with dsCAN from those with eCAN or without CAN, the ResNet 18 model correctly identified 7 out of 9 (78%) patients with dsCAN and 43 out of 46 (93%) patients either without CAN or with eCAN in the test set. Table 4 outlines the confusion matrix, and the performance metrics can be reviewed in Table 3. The ResNet 18 model attained an AUCROC of 0.94 (95% CI 0.86-1.00), as demonstrated in Fig. 3. The model based on ResWide 50 achieved better sensitivity, correctly classifying 8 out of 9 patients (89%) with dsCAN, but with a lower AUCROC of 0.91 (95% CI 0.73-1.00) (Table 3). Performance of the alternative model backbones can be found in Supplementary Figure S2 and Table S5.

Attribution maps
Figure 4 displays retinal images from the test set that were correctly identified by the ResNet 18 model, alongside the corresponding Grad-CAM and Guided Grad-CAM visualizations. The attribution maps for correctly classified patients without CAN highlighted the macula and optic disc. Conversely, in subjects with dsCAN, the optic disc was the primary focus of the attribution maps. Additionally, the Guided Grad-CAM visualizations were notable for their emphasis on the retinal vasculature, a feature that was more pronounced in the peripheral retinal fields, as presented in Supplementary Figure S3.

Discussion
The main findings of our study are as follows: (i) the development of novel AI-based deep learning algorithms which detect CAN and classify its severity; (ii) a binary AI algorithm achieving good sensitivity with excellent specificity for CAN detection; and (iii) a further AI-based deep learning algorithm (DLA) to detect severe CAN manifestations with high sensitivity and specificity. As far as we are aware, this is the first study that provides a solution to the problematic diagnosis of CAN. Our AI model performs classification without the need for expert annotation, remedying operator bias and, importantly, utilising DR screening to diagnose a distinct microvascular complication which is not routinely screened for. Importantly, previous research utilising clinical data demonstrates that AI using data-driven machine learning approaches achieves outstanding performance for the prediction of CAN (AUC: 0.96 [95% CI 0.94-0.98]), with an accuracy of 87% and a sensitivity of 87% [17].
In the present study, using fundus imaging, our models had excellent diagnostic ability to identify patients with CAN (sensitivity 0.93, specificity 0.89, AUC 0.87) and those with definite or severe CAN presentations (sensitivity 0.78, specificity 0.93, AUC 0.94).
AI has been extensively researched in the detection of DR, specifically referable DR, given the current increase in the global prevalence of visual impairment and the associated health economic costs of expert graders identifying early DR in retinal fundus images [18]. Indeed, AI-based algorithms were first utilized in the US for DR screening and are established in the detection of referable DR [19]. Subsequently, preliminary AI-based algorithms have been developed with the aim of predicting favorable outcomes with anti-VEGF injections [20]. Microangiopathy is a cardinal pathophysiological feature of DM complications including DR and DPN. As such, the vasculopathy of DR noted on fundus images may serve as a biomarker for risk stratification of other diabetes-related microvascular complications [21]. It has been demonstrated that retinal microvasculature analysis using AI predicts: (1) a number of CVD (cardiovascular disease) risk factors including DM and hypertension, (2) direct CVD events including CV mortality, and (3) CVD biomarkers such as coronary artery calcium score [22]. Recently, Mordi et al. [23] have demonstrated that retinal parameters, alone and in combination with a genome-wide polygenic risk score for coronary heart disease, have independent and incremental prognostic value compared with traditional CV risk assessment in T2DM.
Lee et al. [24] successfully presented AI-based models which predicted the risk of DPN through the development of four deep learning architectures (InceptionNet, VGGNet, ResNet, and ConvMixer) using fundus photography images. The combined sensitivity values of disease severity stratified DPN reached 0.84, 0.90, 0.90, and 0.92 (mild-moderate-severe) and thus demonstrated that the AI-based models were able to determine the presence of DPN and its associated severity [24]. We have previously demonstrated the use of AI to diagnose DPN through the use of corneal nerve images. In a seminal study, Williams et al. [25] segmented corneal nerves from corneal confocal microscopy images and used U-Net models to demonstrate superior segmentation and classification performance compared to other non-AI based automated models. Subsequently, Preston et al. [9] developed a DLA using a ResNet backbone architecture to detect peripheral neuropathy (in DM and prediabetes); similar to the current study, it achieved excellent classification accuracy using end-to-end classification (without the need for segmentation). We further developed this model into a clinically translatable binary classification system of no DPN vs DPN present [26].
Color fundus photography is an established, readily available diagnostic tool primarily deployed for DR screening. The American Diabetes Association (ADA) advocates for biennial screening for individuals with DM who exhibit no signs of DR and endorses the use of retinal photography in screening programs [27]. This established infrastructure for fundus imaging presents a favourable framework for extending its utility to the detection of additional diabetic complications, offering a cost-effective approach without necessitating supplementary testing procedures. Our AI-driven model processes multiple retinal images per subject, encompassing the peripheral retinal fields, which may capture nuanced vascular changes indicative of systemic complications. The rationale for using retinal imaging in conjunction with a DLA for neuropathy diagnosis is underpinned by findings from Neriyanuri et al. [28]. Their study revealed that DPN manifests in the retina as structural and functional impairments, even in the absence of DR. This is characterized by increased foveal thickness and reduced retinal nerve fibre layer thickness, alongside declines in various visual functions. Choi et al. [29] further support this by demonstrating significant associations between inner retina thickness and cardiovascular autonomic dysfunction. Their study found that eyes with retinal nerve fibre layer defects had significantly thinner ganglion cell-inner plexiform layer thicknesses.
It is well recognized that diabetic microvascular disease complications cluster, and as such, CAN is more likely to occur when diabetic retinopathy is present. Notably, our study demonstrated no differences in rates of DR between patients without CAN and those with differing severity of CAN. This suggests that retinal changes associated with neuropathies may have distinctive features independent of the classic DR markers. While the subtle structural changes related to CAN are not readily discernible to human clinicians through traditional fundus examination, our study illustrates the potential of AI in unveiling these hidden patterns. There are no direct visual features for diagnosing CAN through retinal images in current clinical practice. However, the high accuracy of our AI models suggests that these computational techniques can identify minute retinal alterations beyond human visual capability. Previous research has also effectively utilized machine learning methodologies in conjunction with fundus photography to yield appreciable diagnostic accuracy for DPN. For instance, Benson et al. employed a support vector machine classifier, attaining a sensitivity of 78% and a specificity of 95% in identifying DPN [30].
We applied the Toronto criteria for the diagnosis of CAN, which are based on CARTs and recognized as the gold standard in CAN diagnostics [2]. This ensures a robust diagnostic framework and strengthens the validity of our model's capability to accurately identify CAN. The present investigation utilized a modestly sized cohort (N = 229), which, while leading to wide CIs, still attained a reasonable level of classification accuracy. Augmenting the model with clinical and demographic data could potentially enhance its diagnostic precision and facilitate the creation of a three-tier classification system that would detect CAN and differentiate its severity within one model. Validation of this AI-driven DLA is necessary in a larger cohort and, subsequently, prospectively within a broader clinical setting. Upon successful validation, it will be crucial to develop cost-effectiveness frameworks to evaluate the potential economic implications of its healthcare application.

Limitations
The study included a relatively small number of patients but remains a novel proof of concept for wider clinical research. This was also a single-centre study, which requires further validation in more heterogeneous patient populations to test the model's performance in real-world clinical deployment. Future modification of the model by including clinical and demographic data may improve the diagnostic performance, although the current model already achieved excellent classification accuracy. In contrast to conventional AI model training approaches for 2D images, the Multiple Instance Learning mechanism typically requires a longer training time. For instance, even with a relatively limited training dataset, our AI model required an average of 48 h for 400 epochs of training across the various backbones. Nonetheless, we consider model training a one-time event, whereas the speed of inference holds greater significance in assessing the algorithm's performance and its applicability in real-world scenarios. Remarkably, our model achieves an inference time of approximately 0.19 s per patient on average, producing accurate diagnostic results.
In conclusion, AI-based algorithms utilising retinal images can differentiate patients with CAN with high accuracy. AI analysis of fundus images to detect CAN may be implemented in routine clinical practice to identify patients at the highest CV risk; however, external validation of our findings and algorithm optimization in a prospective clinical study are required.

Fig. 1
Fig. 1 Flowchart of study participants. CAN cardiac autonomic neuropathy, CARTs cardiovascular autonomic reflex tests

Fig. 2

Fig. 3
Fig. 3 ROC curve of the binary model classifying to no CAN/eCAN or dsCAN

Fig. 4
Fig. 4 Attribution map results from ResNet 18. Example images from patients without CAN, with eCAN or with dsCAN. First row: original images; second row: Grad-CAM images; third row: Guided Grad-CAM images

Table 1
Baseline characteristics of study participants

Table 2
Confusion matrix for the ResNet 18 model differentiating patients with CAN and those without CAN

Table 3
Performance metrics of the two experiments utilising binary models: differentiating patients with any stage of CAN from those without CAN, and differentiating patients with dsCAN from those with eCAN or no CAN
AUC area under the curve, CAN cardiac autonomic neuropathy, CI confidence interval, dsCAN definite or severe CAN, eCAN early CAN
* 95% CI

Table 4
Confusion matrix for the ResNet 18 model differentiating patients with dsCAN from those with eCAN or without CAN
CAN cardiac autonomic neuropathy, dsCAN definite or severe CAN, eCAN early CAN